406 research outputs found

    Incorporating prior information into association studies.

    Get PDF
    UnlabelledRecent technological developments in measuring genetic variation have ushered in an era of genome-wide association studies which have discovered many genes involved in human disease. Current methods to perform association studies collect genetic information and compare the frequency of variants in individuals with and without the disease. Standard approaches do not take into account any information on whether or not a given variant is likely to have an effect on the disease. We propose a novel method for computing an association statistic which takes into account prior information. Our method improves both power and resolution by 8% and 27%, respectively, over traditional methods for performing association studies when applied to simulations using the HapMap data. Advantages of our method are that it is as simple to apply to association studies as standard methods, the results of the method are interpretable as the method reports p-values, and the method is optimal in its use of prior information in regards to statistical power.AvailabilityThe method presented herein is available at http://masa.cs.ucla.edu

    Behavior-Based Outlier Detection for Network Access Control Systems

    Get PDF
    Network Access Control (NAC) systems manage the access of new devices into enterprise networks to prevent unauthorised devices from attacking network services. The main difficulty with this approach is that NAC cannot detect abnormal behaviour of devices connected to an enterprise network. These abnormal devices can be detected using outlier detection techniques. Existing outlier detection techniques focus on specific application domains such as fraud, event or system health monitoring. In this paper, we review attacks on Bring Your Own Device (BYOD) enterprise networks as well as existing clustering-based outlier detection algorithms along with their limitations. Importantly, existing techniques can detect outliers, but cannot detect where or which device is causing the abnormal behaviour. We develop a novel behaviour-based outlier detection technique which detects abnormal behaviour according to a device type profile. Based on data analysis with K-means clustering, we build device type profiles using Clustering-based Multivariate Gaussian Outlier Score (CMGOS) and filter out abnormal devices from the device type profile. The experimental results show the applicability of our approach as we can obtain a device type profile for five dell-netbooks, three iPads, two iPhone 3G, two iPhones 4G and Nokia Phones and detect outlying devices within the device type profile

    Combining Strategies for Extracting Relations from Text Collections

    Get PDF
    Text documents often contain valuable structured data that is hidden in regular English sentences. This data is best exploited if available as a relational table that we could use for answering precise queries or for running data mining tasks. Our Snowball system extracts these relations from document collections starting with only a handful of user-provided example tuples. Based on these tuples, Snowball generates patterns that are used, in turn, to find more tuples. In this paper we introduce a new pattern and tuple generation scheme for Snowball, with different strengths and weaknesses than those of our original system. We also show preliminary results on how we can combine the two versions of Snowball to extract tuples more accurately

    Multiple testing correction in linear mixed models.

    Get PDF
    BackgroundMultiple hypothesis testing is a major issue in genome-wide association studies (GWAS), which often analyze millions of markers. The permutation test is considered to be the gold standard in multiple testing correction as it accurately takes into account the correlation structure of the genome. Recently, the linear mixed model (LMM) has become the standard practice in GWAS, addressing issues of population structure and insufficient power. However, none of the current multiple testing approaches are applicable to LMM.ResultsWe were able to estimate per-marker thresholds as accurately as the gold standard approach in real and simulated datasets, while reducing the time required from months to hours. We applied our approach to mouse, yeast, and human datasets to demonstrate the accuracy and efficiency of our approach.ConclusionsWe provide an efficient and accurate multiple testing correction approach for linear mixed models. We further provide an intuition about the relationships between per-marker threshold, genetic relatedness, and heritability, based on our observations in real data

    Identification of causal genes for complex traits.

    Get PDF
    MotivationAlthough genome-wide association studies (GWAS) have identified thousands of variants associated with common diseases and complex traits, only a handful of these variants are validated to be causal. We consider 'causal variants' as variants which are responsible for the association signal at a locus. As opposed to association studies that benefit from linkage disequilibrium (LD), the main challenge in identifying causal variants at associated loci lies in distinguishing among the many closely correlated variants due to LD. This is particularly important for model organisms such as inbred mice, where LD extends much further than in human populations, resulting in large stretches of the genome with significantly associated variants. Furthermore, these model organisms are highly structured and require correction for population structure to remove potential spurious associations.ResultsIn this work, we propose CAVIAR-Gene (CAusal Variants Identification in Associated Regions), a novel method that is able to operate across large LD regions of the genome while also correcting for population structure. A key feature of our approach is that it provides as output a minimally sized set of genes that captures the genes which harbor causal variants with probability ρ. Through extensive simulations, we demonstrate that our method not only speeds up computation, but also have an average of 10% higher recall rate compared with the existing approaches. We validate our method using a real mouse high-density lipoprotein data (HDL) and show that CAVIAR-Gene is able to identify Apoa2 (a gene known to harbor causal variants for HDL), while reducing the number of genes that need to be tested for functionality by a factor of 2.Availability and implementationSoftware is freely available for download at genetics.cs.ucla.edu/caviar

    Assembly of non-unique insertion content using next-generation sequencing

    Get PDF
    Recent studies in genomics have highlighted the significance of sequence insertions in determining individual variation. Efforts to discover the content of these sequence insertions have been limited to short insertions and long unique insertions. Much of the inserted sequence in the typical human genome, however, is a mixture of repeated and unique sequence. Current methods are designed to assemble only unique sequence insertions, using reads that do not map to the reference. These methods are not able to assemble repeated sequence insertions, as the reads will map to the reference in a different locus
    corecore